|
Starting date of the project: December 2006
|
Month of the Year |
Progress |
December-06
| A detailed study of the existing Gurmukhi OCR has been made, and its limitations and areas for improvement have been noted. The following observations have been made about the present Gurmukhi OCR:
Strengths:
Limitations:
Susceptible to noise
Works well only on clean documents
Touching consonants are not recognized
Does not recognize digits and some special symbols
Works only on single-column text
Work has been initiated on developing a corpus for training and testing the OCR.
|
January-07
| Twenty-five books representing different fonts, time periods, publishers and print qualities have been identified for development of the corpus. Around 1000 pages have been scanned for the corpus.
Segmentation algorithms for overlapping text lines and merged characters are being developed.
|
|
|